This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix for #16747 #16781

Merged: 1 commit into apache:master on Nov 12, 2019

Conversation

@ptrendx (Member) commented on Nov 12, 2019

Description

Fixes #16747.

This PR fixes 2 problems:

  • The name of the fused op on the backward pass contained names of nodes from the forward pass, sometimes making the name excessively long. This happened because backward nodes carry control dependencies to their corresponding forward nodes, and DFSVisit (used to generate the FusedOp name) traverses those edges as well.
  • Under some circumstances (including the ones in "Fused Op causes MXNetError" #16747) a backward node inside a FusedOp could have multiple nodes listed in its control dependencies (instead of just the single corresponding forward node). FusedOp would then take all of them and treat them as nodes required for type and shape inference. If any of those unnecessary nodes was another backward node that was subsequently fused, that control dependency would be changed into a Helper node, which does not exist in the IndexedGraph, causing an error during the ProvideAttrToFusion call. This PR changes the behavior so that only the first control dependency (the real forward node) is taken into account when constructing the control dependencies of the fusion node; see the sketch after this list.
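
For illustration, below is a minimal C++ sketch of the idea behind the second fix. This is not the actual MXNet fusion code: the `Node` struct and the `FusedControlDeps` helper are hypothetical stand-ins for the nnvm graph structures. The logic only mirrors the described change, keeping just the first control dependency of each node pulled into the fusion (its real forward node) when building the fused node's control dependencies.

```cpp
#include <memory>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified stand-in for an nnvm-style graph node
// (names and fields are assumptions, not the actual MXNet types).
struct Node {
  std::string name;
  // For a backward node, control_deps[0] is its corresponding forward node.
  std::vector<std::shared_ptr<Node>> control_deps;
};

// Sketch of the fix: when building the control dependencies of the fused
// node, keep only the first control dependency of each subsumed node (the
// real forward node) and skip anything that is itself part of the fusion,
// so no dependency can end up pointing at a fused-away backward node.
std::vector<std::shared_ptr<Node>> FusedControlDeps(
    const std::vector<std::shared_ptr<Node>>& fused_nodes) {
  std::unordered_set<const Node*> in_fusion;
  for (const auto& n : fused_nodes) in_fusion.insert(n.get());

  std::vector<std::shared_ptr<Node>> deps;
  for (const auto& n : fused_nodes) {
    if (n->control_deps.empty()) continue;
    const std::shared_ptr<Node>& fwd = n->control_deps.front();
    if (in_fusion.count(fwd.get())) continue;  // ignore nodes inside the fusion
    deps.push_back(fwd);                       // keep only the real forward node
  }
  return deps;
}
```

Since no other backward node can end up among the fused node's control dependencies this way, a later fusion pass cannot leave a dependency pointing at a Helper node that is absent from the IndexedGraph, which is what triggered the failure in ProvideAttrToFusion.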

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@marcoabreu (Contributor)

Shall we add some test since our CI didn't catch these cases?

@ptrendx (Member, Author) commented on Nov 12, 2019

@marcoabreu I would like to, but frankly this failure seems to happen only in a very specific scenario, and I did not find any small example exposing the issue; I was working with the repro from #16747.

@marcoabreu (Contributor)

Ok

@marcoabreu marcoabreu merged commit e3f5dad into apache:master Nov 12, 2019
ptrendx added a commit to ptrendx/mxnet that referenced this pull request Nov 15, 2019
ptrendx added a commit that referenced this pull request Nov 16, 2019
…, #16792) (#16832)

* Fix nightly build (#16773)

* Remove dependency on tvmop.conf

* Fix binaries dependencies for ni nightly

* Add comments

* Update tvmop.py

* Fix rebase

* Fix (#16781)

* Speed up fused_op compilation by caching ptx and jit-compiled functions (#16783)

* [Numpy] Fix collect_params().zero_grad() in gluon numpy interface (#16716)

* fix zero_grad

* Update parameter.py

* add test

* fix

* Mixed data type binary ops (#16699)

* support mixed-precision binary operations

* improvement for documentations and error messages

* Support boolean elemwise/broadcast binary add, multiply and true_divide (#16728)

* support pure boolean elemwise/broadcast binary op

* switch to unique_ptr

* fix the test error

* Fix rtrue_divide grad (#16769)

* Fix rtrue_divide_scalar

* More tests

* Fix numpy-compatible mean output type for integer inputs (#16792)

* fix mean output type for integer inputs

* enable for windows

Successfully merging this pull request may close these issues: Fused Op causes MXNetError (#16747).
2 participants